- From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>
-
-
- mod.std.c Digest Mon, 1 Jul 85 Volume 8 : Issue 3
-
- Today's Topics:
- (B.1.1.2, C.8) Use of Whitespace in the Preprocessor
- (B.2.2) Character display semantics
- (B.2.4.1) Translation limits
- (B.2.4.2, D.1, etc.) Quasi-reserved words.
- (C.8.2) Macro replacement
- (C.8.3) Conditional inclusion
- (D.) Operating-system defined values and C data types.
- (D.10.2) Rand
- (D.12.3.1) asctime
- (D.12.3.4) gmtime
- (D.3) Character Testing Functions
- (D.8) Variable arguments
- (D.9.9) File pointers
- ----------------------------------------------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (B.1.1.2, C.8) Use of Whitespace in the Preprocessor
- To: std-c@cbosgd
-
- Comments on ANSI C Draft Standard (X3J11/85-045, April 30, 1985).
- **
- ** Note: these are my personal comments and not necessarily those
- ** of my employer.
- **
-
- There seems to be some confusion (in my mind anyways) regarding
- the interaction of "tokenization" (sec B.1.1.2) and the preprocessor.
- According to the description of translation phases (phases 1 and 2
- are irrelevant to this discussion):
- Phase
- 3. Comments are replaced by one space character.
- 4. The source text is completely tokenized... Each sequence
- of other [not newline] white-space characters becomes a single
- white-space token; alternatively, each other white-space
- token becomes a unique token. [I don't understand this and
- am not sure that "alternatives" are appropriate here.]
- 5. The source text is preprocessed.
-
- However, in the description of the preprocessor (sec C.8),
- "there may be any number of space and horizontal-tab characters
- between the # token and the identifier that constitutes the next
- token, and before the new-line character that terminates the directive."
-
- Some questions:
-
- 1. If I understand the translation phases correctly, there are no
- "space and horizontal-tab characters", but rather "white-space
- tokens." Since C.8 names space and tab explicitly, but not
- the other two white-space characters (vertical-tab and form-feed),
- it appears that the translation phase description
- should be changed as follows:
- 3. ... Each comment is replaced by one <SPACE> token. Space and
- horizontal-tab characters are replaced by <SPACE> tokens.
- Vertical-tab and form-feed characters are replaced by <FEED>
- tokens.
- 6. ... The preprocessing concatenation operation is applied and the
- full source is retokenized. <FEED> tokens become <SPACE> tokens.
-
- While this appears to clarify the intent of the Draft Standard,
- it would seem simpler to drop all distinction between the
- various white-space tokens. It would also eliminate confusion
- if the Draft Standard were to emphasize that comments may
- be placed anywhere in preprocessor command lines and may
- also cause such lines to extend over more than one physical
- input source line. For example, comments are legal in the
- following contexts:
- /*1*/ # /*2*/ include /*3*/ "filename" /*4*/
- and any of these comments may extend over multiple source lines.
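Because comments are replaced in phase 3, before directives are recognized, every position in the example above is legal, including a comment that crosses a line boundary inside a directive. A minimal sketch, assuming a compiler that follows the phase ordering ANSI C eventually adopted (C89); GREETING and greeting_is_hello are hypothetical names used only for this demo:

```c
/*1*/ # /*2*/ include /*3*/ <string.h> /*4*/

/* GREETING and greeting_is_hello are hypothetical names for this demo.
   The comment inside the #define spans two physical lines; phase 3 turns
   it into one space, so the directive still ends after "hello". */
#define GREETING /* first line of the comment
                    second line of the comment */ "hello"

/* Returns 1 if GREETING expanded to the string "hello". */
int greeting_is_hello(void)
{
    return strcmp(GREETING, "hello") == 0;
}
```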
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (B.2.2) Character display semantics
- To: std-c@cbosgd
-
- Section B.2.2 states that
-
- "The effect of writing a printable character ... to a display device
- is to display a graphic representation of that character at the current
- printing position and then advance the printing position to the next
- position on the current line."
-
- I would recommend adding:
-
- "The effect of writing a printable character at the final printing
- position of a line is implementation defined."
-
- I would recommend deleting \a and \v as they offer no useful capability
- and cannot be implemented in an implementation-independent manner.
-
- I would recommend that, for all \ escapes except \n, the Standard
- provide a definition in terms of the draft ISO DIS 8859/1 (or the
- equivalent draft ANSI X3.134.2 and approved ECMA-94) 8-bit code
- standards, with reference to the earlier ANSI X3.4-1977, ISO 646,
- ISO DIS 2022.2, etc. standards and that all actions be labeled
- "implementation dependent." \n should have the semantics described
- in (B.2.2) and the Draft Standard should note that the semantics of
- the ANSI line-feed code (code position 0/10) are implementation-dependent.
- (I.e., putchar('\n') is not necessarily equivalent to putchar(0xA).)
- This section might be rewritten in roughly the following manner:
-
- The preferred implementation character set for C is specified in
- draft ISO DIS 8859/1, called Latin-1 in this Draft Standard. If
- the implementation supports the Latin-1 character set, escape codes
- have the following representation:
-
- Escape ANSI code Action
- \a 0/7 Audible alert
- \b 0/8 Backspace
- \f 0/12 Form feed
- \r 0/13 Carriage return without advance to new line
- \t 0/9 Horizontal tab
- \v 0/11 Vertical tab
-
- If the implementation does not support the Latin-1 character set,
- the above escapes have implementation-dependent value and actions.
- They must, however, describe different character values (so that
- case statements will not fail.)
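The requirement that the escapes denote distinct values is what lets them appear together as case labels, since duplicate case values are a compile-time error. A small sketch (escape_name is a hypothetical function) that compiles only if all six escapes are distinct; it deliberately does not assume ASCII or Latin-1 code values:

```c
#include <stddef.h>

/* escape_name is a hypothetical helper. The switch below compiles only
   if the six escapes have pairwise distinct values. */
const char *escape_name(int c)
{
    switch (c) {
    case '\a': return "audible alert";
    case '\b': return "backspace";
    case '\f': return "form feed";
    case '\r': return "carriage return";
    case '\t': return "horizontal tab";
    case '\v': return "vertical tab";
    default:   return NULL;   /* not one of the escapes above */
    }
}
```

On a Latin-1 or ASCII host, '\t' would be code position 0/9 (decimal 9), but nothing in the sketch depends on that.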
-
- It should be noted that on a display device implementing ANSI standard
- character set invocation and designation, "putchar('a')" does not
- necessarily display the first character of the roman alphabet in lower
- case. The presentation layer of the display device will show its
- representation of the character currently invoked into GL at position
- 6/1. Actually, since devices can interpret information between the
- DCS, OSC, APC, or PM introducers and the ST delimiter in just about
- any way they please, it does not even promise to display anything!
-
- Should the standard (especially an ANSI one) promise more than
- sending the bit pattern to the display device? Perhaps "\n" should be
- special.
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (B.2.4.1) Translation limits
- To: std-c@cbosgd
-
- Section B.2.4.1 states that "the implementation must be able to compile
- at least one program that meets or exceeds all of the following
- translation limits." Note, however, that a program with 15 nesting
- levels for compound statements, 31-character identifiers, and 1024 identifiers
- in each nested block will require about 500,000 bytes of symbol table
- storage, which is probably not feasible for any but the largest
- implementations. I would suggest removing "all" or changing "maximum"
- number of identifiers with block scope in one block" to "... in one
- block and all of its parents.", if for no other reason than making it
- unnecessary for implementors to ignore this portion of the standard.
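The storage estimate comes from multiplying the limits together: 15 levels × 1024 identifiers × 31 characters is 476,160 bytes for the names alone, before any per-entry bookkeeping. A sketch of that arithmetic; the function symtab_bytes and its per-entry overhead parameter are assumptions for illustration, not part of the Draft Standard:

```c
/* Hypothetical helper: bytes needed to hold every identifier name when
   each of `levels` nested blocks declares `ids_per_block` identifiers of
   `name_len` characters. `overhead` is an assumed per-entry cost
   (terminator, type information, links). */
long symtab_bytes(long levels, long ids_per_block, long name_len, long overhead)
{
    return levels * ids_per_block * (name_len + overhead);
}
```

With only a one-byte terminator per name, symtab_bytes(15, 1024, 31, 1) is 491,520 bytes, already close to the author's estimate of about 500,000.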
-
- The following translation limits seem unreasonably small:
-
- Conditional compilation nesting levels (6) -- I would recommend 16.
-
- Case labels in a switch (255) -- I would recommend at least 512
- and preferably 1024. The yacc grammar for Pascal offers an
- example of a large switch statement.
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (B.2.4.2, D.1, etc.) Quasi-reserved words.
- To: std-c@cbosgd
-
- The Draft Standard has added a large number of #defined symbols
- and type definitions. For example, section B.2.4.2 adds 30 numerical
- limits. I would recommend that all new definitions (i.e. #defined
- symbols and type definitions that are not currently in widespread
- use) be specified with a leading _ so they cannot conflict with
- user code. I.e., the user should have a reasonable chance of defining
- variables that can never conflict with reserved words. As it stands,
- I have no way to tell whether a symbol conflicts with a symbol
- in some library header file.
-
- I would further recommend that the symbols be composed of full English
- words, even if this means more typing. Thus "SHRT_MIN" should be
- "_SHORT_MIN".
-
- Similarly, I would recommend "readonly" rather than "const."
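To make the concern concrete: once <limits.h> is included, names such as SHRT_MIN and SHRT_MAX are macros, so any user variable or function spelled the same way would be silently rewritten by the preprocessor. A small sketch that uses the header's names as intended; clamp_to_short is a hypothetical function:

```c
#include <limits.h>

/* clamp_to_short is a hypothetical helper. It clamps a long into the
   range of short using the <limits.h> macros. A user identifier named
   SHRT_MAX appearing after this #include would be replaced by the
   header's constant, which is the collision the author describes. */
short clamp_to_short(long v)
{
    if (v < SHRT_MIN)
        return SHRT_MIN;
    if (v > SHRT_MAX)
        return SHRT_MAX;
    return (short)v;
}
```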
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (C.8.2) Macro replacement
- To: std-c@cbosgd
-
- The current Draft Standard added the following to C.8.2:
-
- If the identifier following the initial # in a directive has been
- defined as a macro name, the identifier is not replaced by an expansion
- of the macro.
-
- I think this means that if you have written
-
- #define foo endif
- #foo
-
- you don't get #endif, but an example would be helpful.
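Under the rule as quoted, which C89 kept, a directive name is recognized before any macro replacement, so turning a directive name into a macro changes nothing about the directive itself. A sketch of that behavior; ANSWER, SAW_ANSWER and the three functions are hypothetical names for the demo:

```c
/* Turn two directive names into ordinary object-like macros. */
#define endif  0
#define define 1

/* These lines are still #define, #ifdef and #endif: the identifier right
   after # is not macro-replaced, so 'endif' does not become '0' here.
   ANSWER and SAW_ANSWER are hypothetical demo names. */
#define ANSWER 42
#ifdef ANSWER
#define SAW_ANSWER 1
#endif

int answer(void)     { return ANSWER; }     /* 42 */
int saw_answer(void) { return SAW_ANSWER; } /* 1 */

/* Outside the directive-name position, 'define' is an ordinary macro. */
int define_value(void) { return define; }   /* 1 */

/* Remove the demo macros so later code is unaffected. */
#undef endif
#undef define
```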
-
-
- (C.8.2) ## ambiguities.
-
- It appears from my reading that the token created by ## concatenation
- cannot cause further macro expansion, but this is not clearly stated.
- For example, what is the result of the following:
-
- #define concat(a, b) a ## b
- #line 123
- int line = concat(__LI, NE__);
-
- Is it
- line = __LINE__;
- or
- line = 123;
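Under the rule C89 eventually adopted, the token produced by ## is rescanned together with the rest of the replacement list, so the pasted __LINE__ is itself replaced and the answer is line = 123;. A sketch assuming a C89-or-later preprocessor; pasted_line is a hypothetical function name:

```c
#define concat(a, b) a ## b

/* pasted_line is a hypothetical demo function. #line 123 gives the next
   source line the number 123; the token __LINE__ formed by ## is rescanned
   like any other and expands to that number. */
int pasted_line(void)
{
#line 123
    return concat(__LI, NE__);
}
```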
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (C.8.3) Conditional inclusion
- To: std-c@cbosgd
-
- #if directives verified for correctness?
-
- The Draft Standard now specifies that
-
- "Directives are verified for correctness, but processed only to keep
- track of the level of nested conditionals."
-
- Does this mean that
-
- #if 0
- /* never compiled */
- #if )syntax error(
- #endif
- #endif
-
- should print a compiler error message? What about
-
- #undef never_defined
- #ifdef never_defined
- #define xyz(a) ((a) * 1)
- #endif
- ...
- #ifdef never_defined
- #if xyz(123)
- #endif
- #endif
-
- Should the use of xyz() result in an error message, or be replaced
- by zero (undefined preprocessor symbol) or what?
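The wording C89 finally adopted says that directives in a skipped group are processed only through the directive name, to keep track of nesting, and the rest of each such line is ignored. Under that rule both fragments above compile without any diagnostic. A sketch assuming a C89-or-later compiler; skipped_groups_accepted is a hypothetical name:

```c
#if 0
#if )syntax error(     /* skipped: only the name "if" is examined */
#endif
#endif

#undef never_defined
#ifdef never_defined
#define xyz(a) ((a) * 1)
#endif

#ifdef never_defined
#if xyz(123)            /* skipped: xyz(123) is never evaluated */
#endif
#endif

/* Hypothetical demo function: compiling this file at all shows that
   both skipped groups were accepted. */
int skipped_groups_accepted(void) { return 1; }
```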
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.) Operating-system defined values and C data types.
- To: std-c@cbosgd
-
- Several functions (notably fseek/ftell and kill) take parameters
- that are defined in terms of C data types (the ftell result is
- a long, and kill takes an integer process identifier). This
- cannot be made to work correctly on many systems. For example,
- a process is specified on RSX-11M by a vector of three 16-bit words.
- Since the language supports passing structures to functions,
- I would recommend redefining these functions in terms of
- structures (whose contents are defined by appropriate #include
- files) and adding functions that perform any needed arithmetic
- on those structures in an implementation-dependent manner.
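A sketch of what this recommendation might look like for process identifiers. Everything here is hypothetical and illustrative, not an existing API: the type proc_id, its three-word layout (modeled on the RSX-11M description), and the helpers proc_id_make and proc_id_equal. The point is that callers pass the structure around by value and let implementation-supplied functions do any work on it:

```c
/* Hypothetical implementation-defined process identifier: three 16-bit
   words, as described for RSX-11M. Callers never look inside. */
typedef struct {
    unsigned short word[3];
} proc_id;

/* Hypothetical constructor supplied by the implementation. */
proc_id proc_id_make(unsigned short a, unsigned short b, unsigned short c)
{
    proc_id id;
    id.word[0] = a;
    id.word[1] = b;
    id.word[2] = c;
    return id;
}

/* Hypothetical comparison supplied by the implementation: 1 if equal. */
int proc_id_equal(proc_id x, proc_id y)
{
    int i;
    for (i = 0; i < 3; i++)
        if (x.word[i] != y.word[i])
            return 0;
    return 1;
}
```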
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.10.2) Rand
- To: std-c@cbosgd
-
- The definition of rand semantics is reasonable for systems
- with 32-bit two's-complement long integers. It is not clear
- from Knuth (volume 2) that it will yield the same
- sequence of numbers for implementations with greater numeric
- precision or different arithmetic behavior.
-
- Unless, of course, rand is implemented as a function that reads
- a file of 2^32 precomputed integers. But, since the size of this
- file cannot be expressed as a long, srand cannot correctly reposition
- the file.
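One way to get the same sequence regardless of word size is to keep the state in an unsigned long, where overflow wraps by definition, and to return only low-order bits. The sketch below is adapted from the portable sample rand printed in the later C standard; the names my_rand, my_srand and rand_next are hypothetical, chosen to avoid clashing with the library:

```c
/* Hypothetical state variable. Being unsigned long, the multiply and add
   wrap modulo 2^N (N >= 32) instead of overflowing. */
static unsigned long rand_next = 1;

/* Hypothetical replacement for rand: returns a value in [0, 32767].
   Bits 16..30 of the state depend only on its low 31 bits, so the
   sequence is the same whether unsigned long has 32 bits or more. */
int my_rand(void)
{
    rand_next = rand_next * 1103515245UL + 12345UL;
    return (int)((rand_next / 65536UL) % 32768UL);
}

/* Hypothetical replacement for srand. */
void my_srand(unsigned int seed)
{
    rand_next = seed;
}
```

With seed 1 the first value is 16838 on any conforming implementation, because only the low 31 bits of the state ever reach the result.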
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.12.3.1) asctime
- To: std-c@cbosgd
-
- In order to provide for orderly development of local-language
- variants of C, the alphabetic words in the returned string
- should be standardized to their current (English) values --
- which should be included in the Draft Standard. Alternatively,
- the Standard should explicitly state that the actual contents of
- these fields may be implementation dependent. I would prefer the former.
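For reference, the fixed format in question is the familiar one below; C89 did end up pinning it down, with English day and month abbreviations, through a sample algorithm. A sketch that fills in a struct tm by hand and formats it; sample_asctime and matches_english_format are hypothetical names:

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats Sunday, September 16, 1973, 01:03:52.
   asctime reads tm_wday and tm_mon directly, so both are set by hand. */
const char *sample_asctime(void)
{
    struct tm t;
    memset(&t, 0, sizeof t);
    t.tm_year = 73;   /* years since 1900 */
    t.tm_mon  = 8;    /* September (months count from 0) */
    t.tm_mday = 16;
    t.tm_hour = 1;
    t.tm_min  = 3;
    t.tm_sec  = 52;
    t.tm_wday = 0;    /* Sunday */
    return asctime(&t);
}

/* Hypothetical check: 1 if the result uses the English abbreviations
   and the fixed layout. */
int matches_english_format(void)
{
    return strcmp(sample_asctime(), "Sun Sep 16 01:03:52 1973\n") == 0;
}
```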
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.12.3.4) gmtime
- To: std-c@cbosgd
-
- The standard should note that Greenwich Mean Time is more properly
- known as UTC (Universal Coordinated Time.)
-
- [ You mean UCT? -- Mod -- ]
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.3) Character Testing Functions
- To: std-c@cbosgd
-
- The "~" character has the hexadecimal value 0x7E, not 0xFE.
- The DEL character has the hexadecimal value 0x7F, not 0xFF. As
- noted above, they should be defined in terms of the Latin-1 alphabet
- and a particular set of presentation layer designations/invocations
- (ASCII_G in GL, Latin-1 in GR). This avoids all of the issues with
- national replacement character sets (NRCs).
-
- I would suggest that the C Standards Committee coordinate with
- the X3 character set committees.
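In ASCII (the GL half of Latin-1) the printing characters run from space (0x20) through "~" (0x7E), and DEL (0x7F) is a control character; 0xFE and 0xFF lie in the GR half, where Latin-1 assigns them printable letters. A sketch of those boundaries written against code values rather than <ctype.h>, so it does not depend on the execution character set or locale; ascii_is_print and ascii_is_control are hypothetical names:

```c
/* Hypothetical helper: 1 if code value c is an ASCII printing character. */
int ascii_is_print(int c)
{
    return c >= 0x20 && c <= 0x7E;   /* space through '~' */
}

/* Hypothetical helper: 1 if c is an ASCII control (0x00-0x1F or DEL). */
int ascii_is_control(int c)
{
    return (c >= 0x00 && c <= 0x1F) || c == 0x7F;
}
```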
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.8) Variable arguments
- To: std-c@cbosgd
-
- This section refers to section C.7.7.1, which doesn't exist.
-
- What is the behavior of an implementation when the number and
- type of the arguments, as accessed by va_arg, disagree with
- those of the actual function call?
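C89 eventually answered this by making a mismatch undefined behavior: va_arg must be asked for the type the argument actually has after the default argument promotions, and for no more arguments than were passed. Correct use therefore needs some agreement outside the argument list itself, such as a leading count. A sketch; sum_ints is a hypothetical function:

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments. The caller must pass
   exactly `count` ints; asking va_arg for a missing argument or for the
   wrong type is undefined behavior. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6, whereas sum_ints(3, 1, 2) or sum_ints(2, 1.0, 2.0) would be undefined.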
-
- ------------------------------
-
- Date: Sun, 30 Jun 85 12:45:44 edt
- From: decvax!minow (Martin Minow)
- Subject: (D.9.9) File pointers
- To: std-c@cbosgd
-
- Defining the value returned by ftell() as a long cannot work
- on some implementations and unnecessarily limits the size of
- files on all implementations. I would recommend that ftell return
- an implementation-defined structure and that functions be
- provided to manipulate the values of such structures.
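C89 eventually added something close to this: fgetpos and fsetpos, which record a file position in an object of the implementation-defined type fpos_t (though it provides no functions for arithmetic on such positions). A sketch of saving and restoring a position that way; reread_saved_position is a hypothetical name, and tmpfile is used only to keep the example self-contained:

```c
#include <stdio.h>

/* Hypothetical helper: records a position with fgetpos, writes past it,
   returns with fsetpos, and reads the character found there.
   Returns EOF on any failure. */
int reread_saved_position(void)
{
    FILE *fp = tmpfile();
    fpos_t mark;              /* opaque position; no arithmetic on it */
    int c = EOF;

    if (fp == NULL)
        return EOF;
    fputs("abc", fp);
    if (fgetpos(fp, &mark) == 0) {     /* mark: just after "abc" */
        fputs("def", fp);
        if (fsetpos(fp, &mark) == 0)   /* back to just before 'd' */
            c = fgetc(fp);
    }
    fclose(fp);
    return c;
}
```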
-
- ------------------------------
-
- End of mod.std.c Digest - Mon, 1 Jul 85 10:23:12 EDT
- ******************************